Answer for the Assignment

Information regarding the DataSet

It contains monthly returns for five value-weighted portfolios from October 1969 to September 2019. The portfolios are formed using all stocks traded at NYSE, NASDAQ, and AMEX. A proxy for the level of operating profitability (OP) is computed for each stock. Then, the stocks are sorted according to OP and are split evenly into 5 groups (quintiles).

LO contains the lowest OP quintile stocks whereas portfolio HI contains the highest OP quintile stocks. Portfolios QNT2, QNT3, and QNT4 are formed in the obvious fashion using the remaining stocks. The provided file also contains market returns (MKT)2 and T-bill rates (RF). Note that all returns are expressed in percentage points.

Code

# setting working directory
setwd("~/Documents/Coursework/COEC371/problem_sets/MarkedAssignment")
# importing libraries
library(tidyverse)
## ── Attaching packages ─────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.2
## Warning: package 'tibble' was built under R version 3.5.2
## Warning: package 'tidyr' was built under R version 3.5.2
## Warning: package 'purrr' was built under R version 3.5.2
## Warning: package 'dplyr' was built under R version 3.5.2
## Warning: package 'stringr' was built under R version 3.5.2
## Warning: package 'forcats' was built under R version 3.5.2
## ── Conflicts ────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(readxl)
## Warning: package 'readxl' was built under R version 3.5.2
library(stats)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.5.2

Importing data from VW_5_OP_Mkt_Rf_Monthly_196910_201909.xlsx

data =read_excel('VW_5_OP_Mkt_Rf_Monthly_196910_201909.xlsx')
# calculating the risk premium
data <- data %>% mutate(RP = (MKT) - RF)
summary(data)
##       Date              LO                QNT2              QNT3         
##  Min.   :196910   Min.   :-28.4700   Min.   :-21.970   Min.   :-21.0900  
##  1st Qu.:198204   1st Qu.: -2.4450   1st Qu.: -1.627   1st Qu.: -1.5675  
##  Median :199410   Median :  1.0450   Median :  1.245   Median :  1.2450  
##  Mean   :199432   Mean   :  0.7295   Mean   :  0.884   Mean   :  0.9728  
##  3rd Qu.:200703   3rd Qu.:  4.1500   3rd Qu.:  3.817   3rd Qu.:  3.5200  
##  Max.   :201909   Max.   : 17.6200   Max.   : 18.680   Max.   : 17.5900  
##       QNT4                HI               MKT                 RF        
##  Min.   :-20.2400   Min.   :-24.000   Min.   :-22.6400   Min.   :0.0000  
##  1st Qu.: -1.6125   1st Qu.: -1.532   1st Qu.: -1.7500   1st Qu.:0.1400  
##  Median :  1.2600   Median :  1.195   Median :  1.2650   Median :0.4000  
##  Mean   :  0.9658   Mean   :  1.031   Mean   :  0.9275   Mean   :0.3812  
##  3rd Qu.:  3.8175   3rd Qu.:  3.922   3rd Qu.:  3.8975   3rd Qu.:0.5400  
##  Max.   : 18.6500   Max.   : 16.880   Max.   : 16.6100   Max.   :1.3500  
##        RP          
##  Min.   :-23.2400  
##  1st Qu.: -2.0250  
##  Median :  0.9200  
##  Mean   :  0.5463  
##  3rd Qu.:  3.5050  
##  Max.   : 16.1000
str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    600 obs. of  9 variables:
##  $ Date: num  196910 196911 196912 197001 197002 ...
##  $ LO  : num  6.25 -4.77 -2.4 -6.84 7.26 ...
##  $ QNT2: num  4.61 -2.98 -3.47 -7.13 8.54 0.64 -9.37 -4.67 -5.93 8.11 ...
##  $ QNT3: num  4.49 -4.18 -1.3 -7.88 5.32 0.12 -9.56 -4.51 -4.33 10.2 ...
##  $ QNT4: num  6.26 -3.74 -2.07 -7.15 5.01 ...
##  $ HI  : num  6.21 -2.17 -0.33 -7.91 4.08 ...
##  $ MKT : num  5.66 -3.27 -1.99 -7.5 5.75 -0.49 -10.5 -6.39 -5.21 7.45 ...
##  $ RF  : num  0.6 0.52 0.64 0.6 0.62 0.57 0.5 0.53 0.58 0.52 ...
##  $ RP  : num  5.06 -3.79 -2.63 -8.1 5.13 -1.06 -11 -6.92 -5.79 6.93 ...

Part A

to perform Regression Analysis for all five OP portfolios. regress portfolio excess returns on market excess returns.

What are the alpha estimates for the 5 OP portfolios and the standard errors of these estimates?

# attempt to regress 
reg1 = data %>% lm(formula = (LO-RF) ~ RP)
reg2 = data %>% lm(formula = (QNT2-RF) ~ RP)
reg3 = data %>% lm(formula = ((QNT3)-RF) ~ RP)
reg4 = data %>% lm(formula = ((QNT4)-RF) ~ RP)
reg5 = data %>% lm(formula = ((HI)-RF) ~ RP)

#create a table
stock <- c('LO','QNT2','QNT3','QNT4','HI')
betas <- c(reg1$coefficients[2],
        reg2$coefficients[2],
        reg3$coefficients[2],
        reg4$coefficients[2],
        reg5$coefficients[2])
alphas <- c(reg1$coefficients[1],
        reg2$coefficients[1],
        reg3$coefficients[1],
        reg4$coefficients[1],
        reg5$coefficients[1])
df <- tibble(stock,alphas,betas)
df
## # A tibble: 5 x 3
##   stock  alphas betas
##   <chr>   <dbl> <dbl>
## 1 LO    -0.299  1.19 
## 2 QNT2  -0.0371 0.988
## 3 QNT3   0.0699 0.955
## 4 QNT4   0.0511 0.977
## 5 HI     0.125  0.960

according betas, LO should have the highest return and QNT3 should have the lowest.

Which portfolios are fairly priced, overpriced, or underpriced from a CAPM perspective? [Hint: For which portfolios are the alpha estimates statistically significant?]

just to see how the data distribution looks

# density plots
plot(density(data$LO))

plot(density(data$QNT2))

plot(density(data$QNT3))

plot(density(data$QNT4))

plot(density(data$HI))

### Plot SML, as predicted by the CAPM

# plot for part f
plot1 <- data %>% ggplot( aes(y = (LO-RF), x = RP*12))+geom_point(col='yellow4') + xlab('Market Excess Return') + ylab('Portfolio Excess Return') + ggtitle('Does CAPM hold for every portfolio?') + geom_smooth(method='lm')+geom_point(aes(y = (QNT2-RF)),col='yellow3')+geom_point(aes(y=(QNT3-RF)),col='yellow2')+geom_point(aes(y=(QNT4-RF)),col='yellow1')+geom_point(aes(y=(HI-RF)),col='lightyellow')+theme_classic()
plot1 <- ggplotly(plot1)
plot1

On a new graph, plot the CAPM-predicted risk premium (y-axis) vs. the realized excess return (x-axis) for each portfolio. Note that (i) the CAPM-predicted risk premium is given by beta times average market excess return and (ii) the realized risk premium is equal to average excess return. Use annualized returns for easier interpre- tation. What should the plot look like if the CAPM holds? What can you say about the validity of the CAPM from this graph? [Hint: plot the 45-degree line on the same graph.]

CAPM <- data %>% ggplot( aes(y = (reg1$coefficients[2]*mean(RP*12)), x = mean(LO)))+geom_point(col='yellow4')+geom_abline(intercept = 0,slope = 1,color='red',linetype='dashed',size=1.5)+ xlab('Realised Excess Return') + ylab('CAPM - predicted risk premium') + ggtitle('Does CAPM hold for every portfolio?') +geom_point(aes(y = (reg2$coefficients[2]*mean(RP*12)), x = mean(QNT2)),col='yellow3')+geom_point(aes(y = (reg3$coefficients[2]*mean(RP*12)), x = mean(QNT3)),col='yellow2')+geom_point(aes(y = (reg4$coefficients[2]*mean(RP*12)), x = mean(QNT4)),col='yellow1')+geom_point(aes(y = (reg5$coefficients[2]*mean(RP*12)), x = mean(LO)),col='yellow')
CAPM <- ggplotly(CAPM)
CAPM

# PARTB initial data transformation

# expected values
E_LO = mean(data$LO)
E_HI = mean(data$HI)

# risk
sd_LO = sd(data$LO)
sd_HI = sd(data$HI)

# correlation
HILO = cor(data$LO,data$HI)

#table
answer = tibble(E_LO,E_HI,sd_LO,sd_HI,HILO)
answer
## # A tibble: 1 x 5
##    E_LO  E_HI sd_LO sd_HI  HILO
##   <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.730  1.03  5.71  4.48 0.846
annualised_er = answer[1:2]*12
annualised_sd = answer[3:4]*sqrt(12)

annualised_er
##     E_LO    E_HI
## 1 8.7544 12.3698
annualised_sd
##      sd_LO    sd_HI
## 1 19.76415 15.51988
#correlation 
HILO
## [1] 0.8462735
# Sharpe ratio calculation
SharpeRatio_LO = annualised_er/annualised_sd
SharpeRatio_LO = rename(SharpeRatio_LO, SR_LO = E_LO,SR_HI = E_HI)
SharpeRatio_LO
##       SR_LO     SR_HI
## 1 0.4429434 0.7970295

As can be seen from the output, HI provides higher return for the given risk.

# creating weights
weights <-seq(from =-1.5,to = 1.5,length.out=30)

# data table with the weights
df_w <- tibble(wLO = weights, 
               wHI = (1-weights))

# table with er and sd for the given wieghts 
B_df <- df_w %>% mutate(er_P = wLO*E_LO*12 + wHI*E_HI*12,sd_P = sqrt((wLO*sd_LO*sqrt(12))^2 + (wHI*sd_HI*sqrt(12))^2 + 2*wLO*wHI*HILO))
B_df
## # A tibble: 30 x 4
##       wLO   wHI  er_P  sd_P
##     <dbl> <dbl> <dbl> <dbl>
##  1 -1.5    2.5   17.8  48.8
##  2 -1.40   2.40  17.4  46.3
##  3 -1.29   2.29  17.0  43.8
##  4 -1.19   2.19  16.7  41.3
##  5 -1.09   2.09  16.3  38.8
##  6 -0.983  1.98  15.9  36.3
##  7 -0.879  1.88  15.5  33.9
##  8 -0.776  1.78  15.2  31.5
##  9 -0.672  1.67  14.8  29.1
## 10 -0.569  1.57  14.4  26.8
## # … with 20 more rows
# top 6 entries maximum Portfolio SD
head(arrange(B_df,desc(sd_P)))
## # A tibble: 6 x 4
##      wLO   wHI  er_P  sd_P
##    <dbl> <dbl> <dbl> <dbl>
## 1 -1.5    2.5   17.8  48.8
## 2 -1.40   2.40  17.4  46.3
## 3 -1.29   2.29  17.0  43.8
## 4 -1.19   2.19  16.7  41.3
## 5 -1.09   2.09  16.3  38.8
## 6 -0.983  1.98  15.9  36.3

COMPLETE PORTFOLIO

recheck part d

#weights
cp_weights <-seq(from =0,to = 1,length.out=30)

# data table with weights
df_cp <- tibble(wRF = cp_weights, 
               wP = (1-cp_weights))
# table with cp
cp_df <- B_df %>% mutate(er_CP = df_cp$wRF*0.03 + df_cp$wP*er_P,sd_CP = (df_cp$wP*sd_P))
cp_df
## # A tibble: 30 x 6
##       wLO   wHI  er_P  sd_P er_CP sd_CP
##     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 -1.5    2.5   17.8  48.8 17.8   48.8
##  2 -1.40   2.40  17.4  46.3 16.8   44.7
##  3 -1.29   2.29  17.0  43.8 15.9   40.7
##  4 -1.19   2.19  16.7  41.3 14.9   37.0
##  5 -1.09   2.09  16.3  38.8 14.1   33.4
##  6 -0.983  1.98  15.9  36.3 13.2   30.1
##  7 -0.879  1.88  15.5  33.9 12.3   26.9
##  8 -0.776  1.78  15.2  31.5 11.5   23.9
##  9 -0.672  1.67  14.8  29.1 10.7   21.1
## 10 -0.569  1.57  14.4  26.8  9.96  18.5
## # … with 20 more rows
# plot
ggplot()+geom_point(data = B_df,aes(y=er_P,x=sd_P))+geom_point(data = tibble(sd =c(sd_LO*sqrt(12),sd_HI*sqrt(12)),mean = c(E_LO*12,E_HI*12)),aes(x=sd,y=mean),color = "red",size=3,shape=18)+ggtitle("Portfolio returns",subtitle = "with LO and HI")+xlab("standard deviation")+ylab("expected returns")+scale_y_continuous()+scale_x_continuous()+geom_hline(yintercept = 3)+ylim(0,15)+geom_line(data = cp_df,aes(y=er_CP,x=sd_CP),color = 'red')
## Scale for 'y' is already present. Adding another scale for 'y', which
## will replace the existing scale.
## Warning: Removed 8 rows containing missing values (geom_point).
## Warning: Removed 3 rows containing missing values (geom_path).

### tangent portfolio weights

numerator = ((mean(E_LO*12)-3)*sd_HI -(mean(E_HI*12)-3))*HILO
denominator = (mean(E_LO*12)-3)*sd_HI +(mean(E_HI*12)-0.03)*sd_LO-(mean(E_HI*12)-3+mean(E_LO*12)-3)*HILO
cw_LO = numerator/denominator
cw_LO
## [1] 0.1665551
cw_HI = 1-cw_LO
# lets compute the er and sd
er_cp = cw_LO*E_LO*12 + cw_HI*E_HI*12
sd_cp = sqrt((cw_LO*sd_LO*sqrt(12))^2 + (cw_HI*sd_HI*sqrt(12))^2 + 2*cw_LO*cw_HI*HILO)
# sharpe ratio
sharpe_cp = (er_cp - 3)/sd_cp
#output
er_cp
## [1] 11.76764
sd_cp
## [1] 13.35606
sharpe_cp
## [1] 0.6564539

now that we have formed the tangent line, and figured out the tangent portfolio. now we will try to find the complete portfolio, with 12% expected return

# expected returns and the sd for the tangent portfolio
e_tp = df_cp$wP*er_cp + df_cp$wRF*3
sd_tp = df_cp$wP*sd_cp
final_df = tibble(e_tp,sd_tp)
max(final_df$e_tp)
## [1] 11.76764

doubt dont know how to exactly get the last answer, according to my tangent portfolio return, my max is 11.76764, hence getting 12 is impossible, close to 12, which is 11.76764 i am getting 0% weight on the rf asset, hence providing me with the tangent portfolio with no shorting capabiities. Plot below

test <- ggplot()+geom_point(data = B_df,aes(y=er_P,x=sd_P))+geom_point(data = tibble(sd =c(sd_LO*sqrt(12),sd_HI*sqrt(12)),mean = c(E_LO*12,E_HI*12)),aes(x=sd,y=mean),color = "red",size=3,shape=18)+ggtitle("Portfolio returns",subtitle = "with LO and HI")+xlab("standard deviation")+ylab("expected returns")+scale_y_continuous()+scale_x_continuous()+geom_hline(yintercept = 3)+geom_line(data = final_df,aes(y=e_tp,x=sd_tp),color='blue')
test <- ggplotly(test)
test